Abstract. The relationship between algebraic geometry and the inferential
framework of Bayesian Networks with hidden variables has now been fruitfully explored
and exploited by a number of authors. More recently the algebraic formulation of Causal 
Bayesian Networks has also been investigated in this context. After reviewing these 
newer relationships, we proceed to demonstrate that many of the ideas embodied in the 
concept of a "causal model" can be more generally expressed directly in terms of a partial 
order and a family of polynomial maps. The more conventional graphical constructions, 
when available, remain a powerful tool. 
1. Introduction. There has been much recent interest in the study
of causality based on graphs, e.g. [4], [15], [16], [26]. A common scenario
studied is when the observer collects data from a system and wants to make
inferences about what would happen were she to control the system, for
example by imposing a new treatment regime. To make predictions with such
data she needs to hypothesize a certain causal mechanism which not only
describes the data generating process, but also governs what might happen
were she to control the system. Pioneering work by two different groups of
authors [15, 26] has used a graphical framework called a Causal Bayesian
Network (CBN). Their work is based on the Bayesian Network (BN), which
is a compact framework for representing certain collections of conditional
independence statements.

Algebraic geometry and computational commutative algebra have been
successfully employed to address identifiability issues [HI EH H3] and to
understand the properties of the learning mechanisms [19], [20], [21] behind
BN's. A key point was the understanding that collections of conditional 
independence relations on discrete random variables expressed in a suitable 
parametrization are polynomials and have a close link with toric varieties 
[HI [TT] . Further related work showed that pairwise independence and global 
independence are expressed through toric ideals [S] and that Gaussian BN's 
are related to classical constructions in algebraic geometry e.g. [27] . 

In this paper we observe that when model representations and causal
hypotheses are expressed as a set of maps from one semi-algebraic space to
another, then ideas of causality are separated from the classes of graphical
models. This allows us to generalise straightforwardly concepts of graphical
causality as defined in e.g. [15, Definition 3.2.1] to non-graphical model
classes. Many classes of models including context specific BN's [T^l HH 
|2"2"] , Bayes Linear Constraint models (BLC's) [H] and Chain Event Graphs 
(CEG's) [551 [5TJ [551 [5H] are special cases of this algebraic formulation. 

Causal hypotheses are most naturally expressed in terms of two types 
of hypotheses. The first type concerns when and how circumstances might 
unfold. This provides us with a hypothesized partial order which can be re- 
flected by the parametrization of the joint probability mass function of the 
idle system. The second type of hypotheses concerns structural assertions 
about the uncontrolled system that, we assume, also apply in the
controlled system. These are usually expressible as semi-algebraic constraints
in the given parametrization. Under these two types of hypotheses the 
mass function of the manipulated system is defined as a projection of the 
mass function of the uncontrolled system, in total analogy to CBN's. The 
combination of the partial order and of these constraint equations and in- 
equalities enables the use of various useful algebraic methodologies for the 
investigation of the properties of large classes of discrete inferential models 
and of their causal extensions. 

The main observation of the paper is that a (discrete) causal model
can be redefined directly and very flexibly using an algebraic representation
starting from a finite set of unfolding events and a description of the way
they succeed one another. This is shown through model classes of increasing
generality. First, in Section 2, we review the popular class of discrete BN
models, our simplest class, their related factorization formulae under a
preferred parametrisation, and their causal extensions. Then we extrapolate
the algebraic features of BN's and give their formalisation in Section 3 in a
rather general context. In Section 4 we show how this formalization can
apply to more general classes of models than BN's, so that identifiability
and feasibility issues can be addressed. There we describe causal models
based on trees in Section 4.1, and the most general model class we consider
is in Section 4.2.

The issues are illustrated throughout by a typical albeit simple model
for the study of the causes of violence in men who might watch a
violent movie. It is introduced in Section 2.1.1 to outline some limitations of
the BN framework for examining causal hypotheses, a framework which, we
believe, is currently the best one for representing causal hypotheses. In
Section 4.3 we are able to express these limitations within an algebraic setting.

2. Notes on causal Bayesian networks. 

2.1. The BN and its natural parametrization. The discrete BN 
is a powerful framework to describe hypotheses an observer might make 
about a particular system. It consists of a directed acyclic graph with n 
nodes and of a set of probabilistic statements. It implicitly assumes that 
the features of main interest in a statistical model can be expressed in the 
following terms. 
• The observer's beliefs as expressed through the graph concern
statements about relationships between a prescribed set of measurements
$X = \{X_1, X_2, \ldots, X_n\}$ taking values $\{x_1, x_2, \ldots, x_n\}$ in
a product space $S_X = \mathcal{X}_1 \times \mathcal{X}_2 \times \ldots \times \mathcal{X}_n$, where $X_i$ is a random
variable that takes values in $\mathcal{X}_i$, $1 \le i \le n$. For $1 \le i \le n$ let $r_i$,
the cardinality of $\mathcal{X}_i$, be finite and let $X_i$ take values in the set
of integers $1, 2, \ldots, r_i$, henceforth indicated as $[r_i]$. Then the joint
sample space $S_X$ contains $r = \prod_{i=1}^n r_i$ distinct points.

• The sets of relationships most easily read out of the graph are
consistent with a partial order $\prec$ on $X_1, X_2, \ldots, X_n$ implied by the
graph itself. Historically this order was often chosen so that if
$1 \le i_1 < i_2 \le n$ then $X_{i_1} \prec X_{i_2}$ in some rather loose mechanistic
sense, although this is certainly not a necessary interpretation of
the order. In this case we will call the BN regular. Henceforth we
will assume a regular BN.

• The graph expresses the $n - 1$ conditional independence statements

    $X_i \perp\!\!\!\perp \{X_1, X_2, \ldots, X_{i-1}\} \setminus Pa(X_i) \mid Pa(X_i)$

where Pa(Xi) is called the parents of Xi. For a definition see 
For $1 \le i \le n$, in some sense the values the random variables
in $Pa(X_i)$ take embody all relevant probabilistic information
concerning $X_i$. Furthermore for regular BN's $Pa(X_i)$ can be
interpreted as the set of variables in $X$ relevant to the potential
development of $X_i$.
The last property enables the entire set of beliefs to be expressed by
a single directed acyclic graph called a BN. Its vertex set is the set of
measurement variables $\{X_1, X_2, \ldots, X_n\}$ and there is an edge from $X_j$ to
$X_i$ if and only if $X_j \in Pa(X_i)$. The implicit partial order induced by this
directed graph, and its loose link to the order of how circumstances unfold,
have encouraged various authors to extend the model to one that also makes
statements about relationships between the same set of measurements when
they have been subjected to various controls, e.g. [15], [26]. Before discussing
this point, we consider an example to underline some specific features.

2.1.1. A violent example. Consider a statistical model built to
study whether watching a violent movie might induce a man into a fight,
allowing for testosterone levels to, at least partially, explain violent
behaviour. Let $X_2$ denote whether a man watches a violent movie early one
evening $\{X_2 = 1\}$ or not $\{X_2 = 2\}$ and let $X_4$ be an indicator of whether
he is arrested for fighting $\{X_4 = 1\}$ or not $\{X_4 = 2\}$ late that evening. If he
watches the movie, let $X_1$ denote his testosterone level just before seeing
it and $X_3$ his testosterone level late that evening. For a man who does not
watch the movie let $X_1 = X_3$ denote his testosterone level that evening.

Assume $X_1$ and $X_3$ take three values: 1 for low levels of testosterone,
2 for medium levels and 3 for high levels, so that $(r_1, r_2, r_3, r_4) = (3, 2, 3, 2)$
and $r = 36$. Then this can be depicted as the following BN

    X1 -----> X3
            ^ |
           /  v
    X2 -----> X4

The graph of this BN embodies two substantive statements. The first one,
$X_2 \perp\!\!\!\perp X_1$, is associated with the missing edge from $X_1$ to $X_2$ and states that
whether the man watched the movie would not depend on his testosterone
level. The second one, $X_4 \perp\!\!\!\perp X_1 \mid (X_2, X_3)$, is associated with the missing edge
from $X_1$ to $X_4$ and states that the testosterone level before watching the
movie gives no additional relevant information about the man's inclination
to violence provided that we happen to know both whether he watched the
movie and his current testosterone levels. It will be useful later to note
that the edge $(X_2, X_3)$ indicates that watching a violent movie might help
cause the fight by increasing testosterone levels, while the edge $(X_2, X_4)$
indicates that it might do so by some other mechanism.

An alternative semi-algebraic representation of this statistical model is
given as follows. For each of the $r = 36$ levels $x = (x_1, x_2, x_3, x_4) \in S_X$ let
$p(x) = \mathrm{Prob}(X_1 = x_1, \ldots, X_4 = x_4)$ be the joint mass function associated
with the BN. For the sake of simplicity we assume $p(x)$ strictly positive
for each $x$. An obvious inequality constraint is given by the fact that the
vector $(p(x) : x \in S_X)$ lies in the standard simplex

    $\Delta_{r-1} = \{u \in \mathbb{R}^r : \sum_{i=1}^r u_i = 1 \text{ and } u_i \ge 0 \text{ for } i = 1, \ldots, r\}.$    (2.1)

The BN suggests the partial order on the variables for which $X_1$ and
$X_2$ precede $X_3$, which precedes $X_4$. A natural, not unique, parametrization
is then determined by the totally ordered sequence $X_1, X_2, X_3, X_4$ and
has 63 parameters: $\pi_1(x_1) = \mathrm{Prob}(X_1 = x_1)$,
$\pi_2(x_2|x_1) = \mathrm{Prob}(X_2 = x_2 | X_1 = x_1)$,
$\pi_3(x_3|x_1, x_2) = \mathrm{Prob}(X_3 = x_3 | X_1 = x_1, X_2 = x_2)$ and
$\pi_4(x_4|x_1, x_2, x_3) = \mathrm{Prob}(X_4 = x_4 | X_1 = x_1, X_2 = x_2, X_3 = x_3)$. Call
the indeterminates $\pi_1(x_1)$, $\pi_2(x_2|x_1)$, $\pi_3(x_3|x_1, x_2)$, $\pi_4(x_4|x_1, x_2, x_3)$
primitive probabilities, for $(x_1, x_2, x_3, x_4) \in S_X$. Sum-to-one constraints like
$\sum_{x_2 = 1, 2} \pi_2(x_2|x_1) = 1$ give 28 linear constraints to be coupled with the
positivity assumption.
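
The counts of 63 primitive probabilities and 28 sum-to-one constraints can be checked mechanically. The following is a minimal sketch, not part of the original paper; the function name `saturated_counts` is a hypothetical helper introduced only for illustration.

```python
from math import prod

# Cardinalities (r1, r2, r3, r4) of X1, ..., X4 in the violent-movie example.
r = (3, 2, 3, 2)

def saturated_counts(r):
    """Count primitive probabilities and sum-to-one constraints for the
    saturated model under the total order X1, X2, ..., Xn."""
    n_params = 0
    n_constraints = 0
    for i, r_i in enumerate(r):
        n_histories = prod(r[:i])       # configurations of (x1, ..., x_{i-1})
        n_params += n_histories * r_i   # one pi_i(x_i | history) per value x_i
        n_constraints += n_histories    # one simplex (sum-to-one) per history
    return n_params, n_constraints

n_params, n_constraints = saturated_counts(r)
print(n_params, n_constraints)
```

Each history $(x_1, \ldots, x_{i-1})$ contributes one simplex, which is why the number of constraints equals the number of histories summed over the four variables.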

A general joint mass function on $(X_1, X_2, X_3, X_4)$ is given by the 36
quartic equations

    $p(x) = \pi_1(x_1)\,\pi_2(x_2|x_1)\,\pi_3(x_3|x_1, x_2)\,\pi_4(x_4|x_1, x_2, x_3).$    (2.2)

This is a particular form of the general factorisation of the joint mass
function with respect to a BN

    $\mathrm{Prob}(X = x) = \prod_{i=1}^n \pi(x_i | Pa(X_i) = pa(x_i))$    (2.3)

where $x \in S_X$, $\pi(x_i | Pa(X_i) = pa(x_i)) = \mathrm{Prob}(X_i = x_i | Pa(X_i) = pa(x_i))$
and $pa(x_i)$ is the value taken by the random vector $Pa(X_i)$ when $X = x$.

The conditional independence statements in the BN are given by a
finite set of linear equations in primitive probabilities

    $\pi_2(x_2|x_1) = \pi_2(x_2|x_1') = \pi_2(x_2)$ (say)    (2.4)
    $\pi_4(x_4|x_1, x_2, x_3) = \pi_4(x_4|x_1', x_2, x_3) = \pi_4(x_4|x_2, x_3)$ (say)

for all $x_1, x_1' = 1, 2, 3$. See [5] for a proof and a discussion of this. The
statistical model expressed by the BN is then given as a semi-algebraic set
defined by polynomial equations and inequalities in the primitive
probabilities.

Furthermore the simple substitution of Equations (2.4) into (2.2) allows
us to reduce the number of parameters and of constraints. Indeed
the resulting vector $(\pi_1(1), \pi_1(2), \pi_1(3))$ lies in $\Delta_2$, as does each of the vectors
$(\pi_3(1|x_1, x_2), \pi_3(2|x_1, x_2), \pi_3(3|x_1, x_2))$ for $x_1 = 1, 2, 3$ and $x_2 = 1, 2$, whilst
the vector $(\pi_2(1), \pi_2(2))$ and each of the vectors $(\pi_4(1|x_2, x_3), \pi_4(2|x_2, x_3))$
for $x_2 = 1, 2$ and $x_3 = 1, 2, 3$ lie in $\Delta_1$. Each of the 14 simplices also
embodies a linear constraint through its sum-to-one condition, making the
interior of the domain a 21 dimensional linear manifold.
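
The reduced parametrization can be sketched numerically. The example below is an illustration under stated assumptions rather than the authors' code: it draws arbitrary strictly positive points on the 14 simplices (the helper `random_simplex_point` and all numerical values are hypothetical), counts simplices and dimensions, and builds the joint mass function from (2.2) after substituting (2.4).

```python
from itertools import product
import random

random.seed(0)  # reproducible illustrative values

def random_simplex_point(k):
    """Return a strictly positive probability vector of length k."""
    w = [random.random() + 0.1 for _ in range(k)]
    total = sum(w)
    return [v / total for v in w]

X1, X2, X3, X4 = (1, 2, 3), (1, 2), (1, 2, 3), (1, 2)

# The 14 simplices, each labelled by a configuration of the parents.
pi1 = random_simplex_point(3)                                     # 1 simplex
pi2 = random_simplex_point(2)                                     # 1 simplex, by (2.4)
pi3 = {(a, b): random_simplex_point(3) for a in X1 for b in X2}   # 6 simplices
pi4 = {(b, c): random_simplex_point(2) for b in X2 for c in X3}   # 6 simplices, by (2.4)

simplices = [pi1, pi2, *pi3.values(), *pi4.values()]
n_simplices = len(simplices)
dimension = sum(len(s) - 1 for s in simplices)  # each simplex loses one dimension

def p(x1, x2, x3, x4):
    """Joint mass function (2.2) after substituting (2.4)."""
    return (pi1[x1 - 1] * pi2[x2 - 1]
            * pi3[(x1, x2)][x3 - 1] * pi4[(x2, x3)][x4 - 1])

total = sum(p(*x) for x in product(X1, X2, X3, X4))
print(n_simplices, dimension)
```

Summing `p` over $x_3$ and $x_4$ returns `pi1[x1 - 1] * pi2[x2 - 1]`, which is the statement that $X_2$ is independent of $X_1$ in this model.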

A critical point to notice for the generalisations that follow is that each 
of the 14 simplices $(\pi_i(x_i | Pa(X_i)) : x_i \in \mathcal{X}_i)$ is labelled by a particular
configuration of Pa(Xi). In a BN each such configuration of parents labels 
and distinguishes a possible history of circumstances and might influence 
the probabilistic development of the network. 

It is common for a statistical model to contain as its substantive hy- 
potheses more than the conditional independence statements, expressible 
in a BN. Often such additional non-graphical hypotheses can be expressed 
as a set of algebraic equations or inequalities on the primitive probabilities. 
We list a few such additional hypotheses for our example.

• If the movie is not watched then we would expect $X_3 = X_1 \mid (X_2 = 2)$,
equivalently

    $\pi_3(x_3|x_1, x_2 = 2) = \begin{cases} 1 & \text{if } x_3 = x_1 \\ 0 & \text{otherwise.} \end{cases}$    (2.5)

• If a unit did watch the movie, we would not expect this to reduce
his testosterone level. This sets some of the primitive probabilities
to zero, namely

    X_3 | X_1 = x_1, X_2 = 1     x_3 = 1         x_3 = 2         x_3 = 3
    x_1 = 1                      π_3(1|1,1)      π_3(2|1,1)      π_3(3|1,1)
    x_1 = 2                      0               π_3(2|2,1)      π_3(3|2,1)
    x_1 = 3                      0               0               1
                                                                          (2.6)
• The assumption that the higher the prior testosterone levels the
higher the posterior ones, is given by

    $\pi_3(1|2, 1) = r_{3,2}\,\pi_3(1|1, 1)$
    $\pi_3(3|2, 1) = r_{3,3}\,\pi_3(3|1, 1)$    (2.7)

where $0 < r_{3,2}, r_{3,3} < 1$ are additional semi-parametric parameters.

• Similarly it is reasonable to expect that higher levels of testosterone
together with having seen the movie would make it more probable
that a man would be arrested for fighting. This can be expressed
as $\pi_4(1|1, x_3) = r_{4,x_3}\,\pi_4(1|2, x_3)$ for $x_3 = 1, 2, 3$ and, for
$x_3 = 1, 2$, $\pi_4(1|1, x_3 + 1) = r'_{4,x_3}\,\pi_4(1|1, x_3)$ and
$\pi_4(1|2, x_3 + 1) = r''_{4,x_3}\,\pi_4(1|2, x_3)$, where
$0 < r_{4,x_3}, r'_{4,x_3}, r''_{4,x_3} < 1$, similarly to the
previous bullet point.

• Finally a common simple log-linear response model might assume

    $r_{4,1} = r_{4,2} = r_{4,3}.$

The point here is not that these supplementary equations and inequalities
provide the most compelling model, but rather that embellishments of this
type, whilst not graphical, are common, are easily expressed in the
primitive probability parametrization, and often have an almost identical type
of algebraic description as the BN.

In general then, a BN is a collection of monomials in primitive
probabilities and the $p(x)$ parameters. It is defined through a total order of
variables (in the example Equations (2.2)) supplemented by the set of
linear equations on the primitive probabilities

    $\pi_i(x_i|x_1, x_2, \ldots, x_{i-1}) = \pi_i(x_i|x_1', x_2', \ldots, x_{i-1}')$

whenever $(x_1, x_2, \ldots, x_{i-1})$ and $(x_1', x_2', \ldots, x_{i-1}')$ take the same value on
$Pa(X_i)$, $1 \le i \le n$. In the example these are Equations (2.4). More
detailed types of model specification are given by the saturated model,
e.g. Equations (2.2), supplemented by further algebraic and semi-algebraic
equations analogous to Equations (2.4) and to those in the bullet points
above. So a strong case can be made for starting with this class of algebraic
description and relegating the graphical formulation to a useful depiction
of a particular subclass of these structures.

The BN has other associated factorization formulae based on its clique
structure, see e.g. [3], that are more symmetric and have been used as a
vehicle for a different algebraic formulation, see e.g. [51 [5]. In fact it is often
elegant to express this discrete model in terms of its natural exponential
parametrization [6]. However for causal models the partial order on the
$X_i$'s given by the topology of the BN, and hence the associated factorization
of the joint mass function, is critical to the definition of the predicted
effect of manipulating the system: see below. In causal modelling we have
therefore found it to be more expedient to parametrize a model directly
through conditional probabilities chosen so they are consistent with such
a causal partial order. Under the parametrization given by these primitive
probabilities, a BN can be thought of as a labelling of a collection of
simplices about what might happen (the value a node random variable might
take), given the relevant past (the particular configuration of values taken
by its parents).
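
The idea of simplices labelled by parent configurations, together with the supplementary hypotheses (2.5) and (2.6), can be sketched as a small lookup table. This is an illustrative sketch only: the function `pi3_tables` and the free parameters `a`, `b`, `c` are hypothetical names, and the numerical values in the usage line are made up.

```python
def pi3_tables(a, b, c):
    """Primitive probabilities pi3(. | x1, x2) under hypotheses (2.5) and (2.6).

    Keys are parent configurations (x1, x2); values list
    (pi3(1|x1,x2), pi3(2|x1,x2), pi3(3|x1,x2)).
    Hypothetical free parameters:
      a = pi3(2|1,1), b = pi3(3|1,1), c = pi3(3|2,1).
    """
    tables = {}
    # (2.5): movie not watched (x2 = 2), so X3 = X1 with probability one.
    for x1 in (1, 2, 3):
        tables[(x1, 2)] = tuple(1.0 if x3 == x1 else 0.0 for x3 in (1, 2, 3))
    # (2.6): movie watched (x2 = 1), so the level cannot decrease.
    tables[(1, 1)] = (1.0 - a - b, a, b)
    tables[(2, 1)] = (0.0, 1.0 - c, c)
    tables[(3, 1)] = (0.0, 0.0, 1.0)
    return tables

tables = pi3_tables(a=0.3, b=0.2, c=0.4)
for parents, simplex_point in sorted(tables.items()):
    print(parents, simplex_point)
```

Under these two hypotheses only three of the original twelve free parameters of $\pi_3$ remain, one way of seeing how non-graphical constraints cut down the semi-algebraic set.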

2.2. Manifest and hidden variables. Typically it is required to
infer the value of a vector $f(p(x) : x \in S_X)$. If we are interested in the
whole joint mass function, $f$ is the identity. Often $f$ is a polynomial or a
rational function in the primitive probabilities. Obviously such
inference would be trivial if we could learn the full probability table
$(p(x) : x \in S_X)$. However usually only variables in a subset $M$ of $\{X_1, X_2, \ldots, X_n\}$
are measured in a particular population, sometimes over a very large sample
of individuals. The random variables in $M$ are called manifest and those in
$H = \{X_1, X_2, \ldots, X_n\} \setminus M$ are called hidden. Almost always we can learn
only the values of the polynomials

    $p(x(M)) = \sum_{\{x_j \,:\, X_j \in H\}} p(x)$    (2.8)

where $x(M)$ is a sub-vector of $x$ involving only values of the manifest
random variables. For the example in Section 2.1.1 it may be impossible to
determine the testosterone levels $H = \{X_1\}$ of the individuals in any
sample, but only $M = \{X_2, X_3, X_4\}$. If we ignore the positivity conditions, this
is a Newtonian problem in albeit real algebraic geometry and so solvable
through techniques like elimination theory. Indeed when $f$ is the identity
these identifiability questions are now answered for many small BN's by
using elimination techniques. See e.g. [8] and [14] for examples from the
field of computational biology.

Often the study of identifiability issues after observing the manifest
margins (2.8) has been driven more by the semantics of the graph of a BN
where a full node of the graph represents a hidden variable/measurement.
However in practice missingness of data is often contingent on what has
happened to a unit, i.e. the particular value its parent configuration takes
and not the whole variable.

To illustrate this point consider collecting data for the example in
Section 2.1.1 when $X_4$ is hidden and it is the variable of central interest with its
associated probabilities $\pi_4(x_4|x_1, x_2, x_3)$. It might be possible to randomly
sample men and measure their testosterone levels before and after
watching a violent movie. Call this Experiment 1. However if it were seriously
believed that watching a violent movie might induce a fight, it would be
unethical to release the subjects after watching the movie, while any therapy
either in the form of drugs or counselling will corrupt the experiment. In
any case recording the proportions of subjects who later fought would not
give an appropriate estimate of probabilities associated with $X_4$ and
conditional on its parents. So values like $\pi_4(2|x_1, 1, x_3)$ cannot be estimated
from such samples. To identify the system we therefore need to supplement
this type of experiment with another measuring willingness to fight. Other
experiments might be envisaged leading to analogous problems.

Partial information about the joint distribution of $X_4$ with other
variables might be obtained from a random sample of men arrested for fighting
$\{X_4 = 1\}$. Their current testosterone levels $X_3$ and whether they had
recently watched a violent movie $X_2$ could be measured. But we could
not measure $(X_2, X_3)$ for men that are not caught fighting. Thus the
finest partition of probabilities we could hope for in a population under
this kind of survey is based on the sample space partition
$\{A, A(x_2, x_3) : x_2 = 1, 2, x_3 = 1, 2, 3\}$ where $A = \{x : x_4 = 2\}$ and
$A(x_2, x_3) = \{x : X_2 = x_2, X_3 = x_3, X_4 = 1\}$, i.e. $q(A) = \sum_{x \in A} p(x)$ and, for
$x_2 = 1, 2$ and $x_3 = 1, 2, 3$, $q(A(x_2, x_3)) = \sum_{x \in A(x_2, x_3)} p(x)$. Call this
Experiment 2.

The algebraic expressions of the observations from this second
experiment are analogous to Equations (2.8), being sums of the probabilities on
the atoms of the joint mass function, but they are not of the same form
because manifest equations do not correspond to marginal constraints.
Nevertheless the types of elimination techniques applicable to BN's can clearly
still be employed to determine the geometry and properties of its solution
spaces. So the patterns of missing data encountered often have an algebraic
but not a graphical representation.
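
Both kinds of observation, the manifest margins and the coarser partition of Experiment 2, are sums of atoms of the joint mass function. The sketch below illustrates this with an arbitrary strictly positive joint table; the weights, the function names `manifest_margin` and `experiment2`, and the 0-based index convention are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

X1, X2, X3, X4 = (1, 2, 3), (1, 2), (1, 2, 3), (1, 2)
S_X = list(product(X1, X2, X3, X4))

# Arbitrary strictly positive joint mass function (illustrative values only).
weights = {x: Fraction(x[0] + 2 * x[1] + 3 * x[2] + 4 * x[3]) for x in S_X}
Z = sum(weights.values())
p = {x: w / Z for x, w in weights.items()}

def manifest_margin(p, manifest):
    """Sum p(x) over the hidden coordinates; `manifest` holds 0-based indices."""
    q = {}
    for x, px in p.items():
        key = tuple(x[i] for i in manifest)
        q[key] = q.get(key, 0) + px
    return q

def experiment2(p):
    """Probabilities of the partition {A, A(x2, x3)} of Experiment 2."""
    q = {"A": sum(px for x, px in p.items() if x[3] == 2)}   # not arrested
    for x2, x3 in product(X2, X3):
        q[(x2, x3)] = sum(px for x, px in p.items()
                          if (x[1], x[2], x[3]) == (x2, x3, 1))
    return q

q_margin = manifest_margin(p, manifest=(1, 2, 3))   # X1 hidden
q_exp2 = experiment2(p)
print(len(q_margin), len(q_exp2))
```

In this sketch the seven cells of Experiment 2 are a coarsening of the twelve manifest margins with $X_1$ hidden: the six cells with $x_4 = 2$ are merged into the single event $A$.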

2.3. Causal functions. As already mentioned, the regular BN in
Section 2.1.1 could be hypothesised to be causal following many authors
e.g. [121 155]. Here the term "cause" has a very specific meaning and the
causal structure is conventionally associated to the partial order of the
graph in a regular BN. A formal definition is given in the next section. See
also [15, Equation (3.10)]. First we discuss some key points.

Asserting that the BN in Section 2.1.1 is a CBN implies that since
$X_1 \prec X_3$ and $X_2 \prec X_3$ we believe $X_1$ and $X_2$ are potential causes of $X_3$.
This means that if the prior level of testosterone $X_1$ were to be controlled
to take the value $x_1$ and the man were made to watch the film (or not
to), then the probability he had a testosterone value $X_3 = x_3$ would be
the same as the proportion of times $X_3 = x_3$ was observed to occur in
the uncontrolled (infinite) population with observed values $X_1 = x_1$ and
$X_2 = 1$ ($X_2 = 2$ if he was forced not to watch the movie).

Similarly, a causal interpretation of this BN would also assert that the
effect on the probability the man would fight $\{X_4 = 1\}$ if we forced
$\{X_i = x_i : i = 1, 2, 3\}$ would be identified with $\pi_4(1|x_1, x_2, x_3) = \pi_4(1|x_2, x_3)$, i.e.
the corresponding conditional probability in the uncontrolled system.

Furthermore, forcing a variable to take a value $X_i = x_i$ will have no
effect on the joint distribution of the variables which do not follow $X_i$ in
the causal partial order. For example increasing the testosterone level $X_3$
would have no effect on the joint probability of $(X_1, X_2)$.

Obviously a CBN makes stronger statements than a BN with the same
graph. As in the example above, the extra modelling statements made in
a CBN are often plausible and give us a framework within which to make
predictions about the observed system were it to be subject to certain
controls. For example we might want to consider the potential effect of

1. banning the film, thus preventing it from being viewed by the general
public (force $X_2 = 2$) or

2. imposing a treatment on the population for reducing testosterone
levels so that they are always low (force $X_1 = X_3 = 1$), e.g. in an
enclosed population like a prison.

It is easily checked that the predicted potential effect of either of these
controls under the CBN hypothesis is a plausible one. Even when the
idle system is only partially observed, the CBN hypotheses can enable us
to estimate the probable effects of such controls simply from observing
a random sample of men not subject either to a ban or a testosterone
inhibiting treatment.
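
The first control, banning the film, can be sketched as follows. The sketch assumes the usual reading of the CBN hypothesis, in which the factor for the forced variable is dropped and every other primitive probability is kept unchanged; all numerical values are made up and the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

X1, X2, X3, X4 = (1, 2, 3), (1, 2), (1, 2, 3), (1, 2)

# Illustrative primitive probabilities (made-up numbers), already reduced by (2.4).
pi1 = {1: F(1, 2), 2: F(1, 3), 3: F(1, 6)}
pi2 = {1: F(1, 4), 2: F(3, 4)}
pi3 = {(x1, x2): {x3: F(1, 3) for x3 in X3} for x1 in X1 for x2 in X2}
pi4 = {(x2, x3): {1: F(x3 + 2 - x2, 8), 2: 1 - F(x3 + 2 - x2, 8)}
       for x2 in X2 for x3 in X3}

def p_idle(x1, x2, x3, x4):
    """Joint mass function of the uncontrolled (idle) system."""
    return pi1[x1] * pi2[x2] * pi3[(x1, x2)][x3] * pi4[(x2, x3)][x4]

def p_forced_x2(x1, x2, x3, x4, forced):
    """Mass function when X2 is forced to `forced`: the factor pi2 is dropped
    and all other primitive probabilities are those of the idle system."""
    if x2 != forced:
        return F(0)
    return pi1[x1] * pi3[(x1, x2)][x3] * pi4[(x2, x3)][x4]

# Predicted probability of an arrest if the film is banned (X2 forced to 2).
p_fight_ban = sum(p_forced_x2(x1, 2, x3, 1, forced=2) for x1 in X1 for x3 in X3)
p_fight_idle = sum(p_idle(x1, x2, x3, 1) for x1 in X1 for x2 in X2 for x3 in X3)
print(p_fight_ban, p_fight_idle)
```

In this sketch the marginal distribution of $X_1$ is untouched by the ban, in line with the statement that forcing a variable does not affect variables that do not follow it in the causal order.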

The use of the CBN to express causal hypotheses has been successfully
employed in many scenarios e.g. [15], [26], while in others it is restrictive
and implausible, as poignantly discussed in [24]. The main problem is that
causal orders are more naturally defined as refinements of a partial order
on circumstances (in a BN represented by particular configurations of
parents) than on sets of measurements. Again we will use the example in
Section 2.1.1 to demonstrate this. For fuller examples see [U ESI [29]. We
will omit any discussion of the important issue of exactly how we intend to
enact the control of a measurement to a particular value.

In the example the partial order on the nodes of the BN is $X_1, X_2 \prec X_3$
and $X_1, X_2, X_3 \prec X_4$. But note that, in our statement of the problem, if the
man does not watch the movie then by definition $X_1 = X_3$. Under this definition,
manipulating $X_3$ and leaving $X_1$ unaffected, as would be required by the
CBN, is not possible. If we follow the two different types of unfoldings
of history: {prior testosterone level $X_1 = 1, 2, 3$, watch movie $X_2 = 1$,
posterior testosterone level $X_3 = 1, 2, 3$, arrested $X_4 = 1, 2$} and {prior
testosterone level $X_1 = 1, 2, 3$, don't watch movie $X_2 = 2$, arrested
$X_4 = 1, 2$} this sort of ambiguity disappears and we could reasonably conjecture
that these unfoldings are consistent with their "causal order". This might
be expressed by the two context specific graphs below

    X1 --------> X3            X1
               ^ |               \
              /  v                v
    X2 = 1 ----> X4      X2 = 2 -> X4

The joint mass function is no longer defined on the product space $S_X$
with $X = \{X_1, X_2, X_3, X_4\}$. However the joint mass function of each of
these possible unfoldings is well defined and furthermore each unfolding is
expressible as a monomial in the primitive probabilities. Note that the class
of monomials for the right-hand graph is of order one less than the
left-hand one. Many other common problems exist for which the CBN cannot
express a hypothesized causal mechanism whilst algebraic representations
allow this [20], [21].

3. Conditioning and manipulating. 

3.1. Multiplication rule. We start by fixing some notation and
reviewing some known results. For a positive integer $d$ let
$\Delta_{d-1} = \{u \in \mathbb{R}^d : \sum_{i=1}^d u_i = 1 \text{ and } u_i \ge 0 \text{ for } i = 1, \ldots, d\}$ be the $(d-1)$-standard simplex
and $C_d = \{u \in \mathbb{R}^d : 0 \le u_i \le 1 \text{ for } i = 1, \ldots, d\}$ the unit hypercube in $\mathbb{R}^d$.
For a set $A \subset \mathbb{R}^d$, let $A^\circ$ be its interior set in the Euclidean topology.

The set of all joint probability distributions on the n-dimensional
random vector $X = \{X_1, \ldots, X_n\}$ taking the $r$ values in $S_X$, defined in Section
2.1, is identified with the $\Delta_{r-1}$ simplex simply by listing the probabilities
of each value taken by the random vector

    $(p(x) : x = (x_1, \ldots, x_n) \in S_X) \in \Delta_{r-1}$

where $p(x) = \mathrm{Prob}(X = x) = \mathrm{Prob}(X_1 = x_1, \ldots, X_n = x_n)$.

In [8] it is shown that independence of the random variables in $X$
corresponds to the requirement that $p(x)$ belongs to a Segre variety in $\Delta_{r-1}$
and that the naive Bayes model corresponds to the higher secant varieties of
Segre varieties. Local and global independence in a BN are studied in
[0J. The most basic example, here, is that two binary random variables are
independent if $p(0, 0)\,p(1, 1) - p(1, 0)\,p(0, 1) = 0$, the well known condition
of zero determinant of the contingency table for $X_1$ and $X_2$.
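
The zero-determinant condition for two binary random variables can be checked with exact arithmetic. This is a small illustrative sketch; the function names and the numerical tables are hypothetical.

```python
from fractions import Fraction as F

def independence_residual(p):
    """p maps (x1, x2) in {0,1}^2 to probabilities.
    Returns p(0,0)p(1,1) - p(1,0)p(0,1), the determinant of the 2x2 table."""
    return p[(0, 0)] * p[(1, 1)] - p[(1, 0)] * p[(0, 1)]

def product_distribution(a, b):
    """Joint table of independent binaries with Prob(X1=0)=a, Prob(X2=0)=b."""
    return {(0, 0): a * b, (0, 1): a * (1 - b),
            (1, 0): (1 - a) * b, (1, 1): (1 - a) * (1 - b)}

indep = product_distribution(F(1, 3), F(1, 4))
dep = {(0, 0): F(1, 2), (0, 1): F(0), (1, 0): F(0), (1, 1): F(1, 2)}

print(independence_residual(indep))  # vanishes for the product table
print(independence_residual(dep))    # nonzero for the perfectly dependent table
```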

There are various ways to map a simplex into a smaller dimensional
simplex. Some are relevant to statistics. Sturmfels (John von Neumann
Lectures 2003) observes that, for $J \subset [n]$, marginalisation over $X_J$ and $X_{J^c}$
gives a linear map which is a projection of convex polytopes. Namely,

    $m : \Delta_X \longrightarrow \Delta_{X_J} \times \Delta_{X_{J^c}}$
    $(p(x) : x \in S_X) \longmapsto (p_J(x) : x \in S_{X_J},\; p_{J^c}(x) : x \in S_{X_{J^c}})$    (3.1)

where $p_J(x) = \sum_{x_i \in [r_i],\, i \in J^c} p(x_1, \ldots, x_n)$ and analogously for $p_{J^c}(x)$.

Here we compare the two operations of conditioning and manipulation.
Diagram (3.2) summarises this section for binary random variables

    [Diagram (3.2)]

Once the order $X_1 \prec \ldots \prec X_n$ is assumed on the elements of a random
vector $X$ on $S_X = \prod_{i=1}^n \mathcal{X}_i$ and $p(x) = \mathrm{Prob}(X = x)$ for all $x \in S_X$, we can
write

    $p(x) = \pi_1(x_1)\,\pi_2(x_2|x_1) \cdots \pi_n(x_n|x_1, \ldots, x_{n-1})$    (3.3)

where $\pi_1(x_1) = \mathrm{Prob}(X_1 = x_1)$ and
$\pi_i(x_i|x_1, \ldots, x_{i-1}) = \mathrm{Prob}(X_i = x_i | X_1 = x_1, \ldots, X_{i-1} = x_{i-1})$ for $i = 2, \ldots, n$. Note that

    $(\pi_1(x_1) : x_1 \in S_{X_1}) \in \Delta_{r_1 - 1}$
    $(\pi_2(x_2|x_1) : (x_1, x_2) \in S_{(X_1, X_2)}) \in \underbrace{\Delta_{r_2 - 1} \times \ldots \times \Delta_{r_2 - 1}}_{r_1 \text{ times}}$
    $\vdots$
    $(\pi_n(x_n|x_1, \ldots, x_{n-1}) : (x_1, \ldots, x_n) \in S_X) \in \Delta_{r_n - 1}^{\prod_{i=1}^{n-1} r_i}$

Hence the multiplication rule is a polynomial mapping

    $\mu : \Delta_{r_1 - 1} \times \Delta_{r_2 - 1}^{r_1} \times \ldots \times \Delta_{r_n - 1}^{\prod_{i=1}^{n-1} r_i} \longrightarrow \Delta_{\prod_{i=1}^n r_i - 1}$    (3.4)

where the domain is parametrised by the primitive probabilities and the
image space by the joint mass probabilities. For two binary random variables
let

    $s_1 = \mathrm{Prob}(X_1 = 0)$
    $s_2 = \mathrm{Prob}(X_2 = 0 | X_1 = 0)$
    $s_3 = \mathrm{Prob}(X_2 = 0 | X_1 = 1)$

then $\Delta_1 \times \Delta_1^2$ is isomorphic to $C_3$ and

    $\mu : C_3 \longrightarrow \Delta_3$
    $(s_1, s_2, s_3) \longmapsto (s_1 s_2,\; s_1(1 - s_2),\; (1 - s_1)s_3,\; (1 - s_1)(1 - s_3))$

The coordinates of the image vector are listed according to a typical order
in experimental design given by taking points from top to bottom when
listed like those in Table 1 for $n = 3$ and for binary random variables.

    x_1  x_2  x_3
     0    0    0
     0    0    1
     0    1    0
     0    1    1
     1    0    0
     1    0    1
     1    1    0
     1    1    1

    Table 1
    Top to bottom listings of sample points

We note that the map (3.4) is not invertible on the boundary but it
is invertible, through the familiar equations for conditional probability,
within the interior of the simplex where division can be defined. For
problems associated with the single unmanipulated system this is not critical
since such boundary events will occur only with probability zero. However
when manipulations are considered it is legitimate to consider what might
happen if we force the system so that events that would not happen in
the unmanipulated system were made to happen in the manipulated
system. It follows that from the causal modelling point of view the conditional
parametrisation is more desirable.
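
For two binary random variables the multiplication rule and its inverse on the interior can be written out directly. The following is a minimal sketch with exact rational arithmetic; the function names `mu` and `mu_inverse` are chosen here for illustration.

```python
from fractions import Fraction as F

def mu(s1, s2, s3):
    """Multiplication rule for two binary variables, C_3 -> Delta_3.
    Output order: p(0,0), p(0,1), p(1,0), p(1,1)."""
    return (s1 * s2, s1 * (1 - s2), (1 - s1) * s3, (1 - s1) * (1 - s3))

def mu_inverse(p00, p01, p10, p11):
    """Inverse of mu through conditional probabilities.
    Only defined when Prob(X1 = 0) is neither 0 nor 1."""
    s1 = p00 + p01
    if s1 == 0 or s1 == 1:
        raise ZeroDivisionError("a conditional probability is undefined on this face")
    return (s1, p00 / s1, p10 / (1 - s1))

s = (F(1, 3), F(1, 4), F(1, 5))
print(mu(*s))
print(mu_inverse(*mu(*s)) == s)

# On the boundary s1 = 0 the value of s2 is lost: two different points of C_3
# are sent to the same joint mass function.
print(mu(F(0), F(1, 4), F(1, 5)) == mu(F(0), F(3, 4), F(1, 5)))
```

The last comparison illustrates why the conditional parametrisation retains information that the joint mass function discards on the boundary: the behaviour of $X_2$ given an event of probability zero is still recorded in $s_2$.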

3.2. Conditioning as a projection. Consider $i \in [n]$ and define
$x_{-i} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$ and
$[r_{-i}] = \mathcal{X}_1 \times \ldots \times \mathcal{X}_{i-1} \times \mathcal{X}_{i+1} \times \ldots \times \mathcal{X}_n$.
Analogous symbols are defined for $J \subset [n]$. For $x_i^* \in [r_i]$ such
that $\mathrm{Prob}(X_i = x_i^*) \ne 0$, the conditional probability of $X$ on $\{X_i = x_i^*\}$ is
defined as

    $\mathrm{Prob}(X = x | X_i = x_i^*) = \begin{cases} 0 & \text{if } x_i \ne x_i^* \\ \dfrac{p(x)}{\mathrm{Prob}(X_i = x_i^*)} & \text{if } x_i = x_i^* \end{cases}$

Outside the set $x_i \ne x_i^*$, this mapping is an example of the simplicial
projection on the face $x_i = x_i^*$. Briefly, any simplex $\Delta$ in the Euclidean space
is the join of any two complementary faces, which are simplices themselves.
In particular, if $F$ and $F^c$ are complementary faces, then each point $P$ in
the simplex and not in $F$ or $F^c$ lies on the segment joining some point $P_F$
in $F$ and some point $P_{F^c}$ in $F^c$, and on only one such segment. This allows
us to define a projection $\pi_F : \Delta \setminus F^c \longrightarrow F$, by $\pi_F(P) = P_F$ if $P \notin F$ and
$\pi_F(P) = P$ if $P \in F$.

Example 1. For $n = 2$ and $P = (p(0,0), p(0,1), p(1,0), p(1,1))$ with
$p(0,0) + p(0,1) \ne 0$, $F = \{x \in \Delta_3 : x = (x_1, x_2, 0, 0)\}$ and
$F^c = \{x \in \Delta_3 : x = (0, 0, x_3, x_4)\}$, we have

    $P_F = \dfrac{1}{p(0,0) + p(0,1)}\,(p(0,0), p(0,1), 0, 0)$
    $P_{F^c} = \dfrac{1}{p(1,0) + p(1,1)}\,(0, 0, p(1,0), p(1,1))$
    $P = (p(0,0) + p(0,1))\,P_F + (p(1,0) + p(1,1))\,P_{F^c}$

For $X$ and $Y$ binary random variables, the operation $P(Y | X = 0)$
corresponds to

    $\Delta_3^\circ \longrightarrow \Delta_1$
    $(p(0,0), p(0,1), p(1,0), p(1,1)) \longmapsto \dfrac{1}{p(0,0) + p(0,1)}\,(p(0,0), p(0,1))$

It can be extended to the boundary of $\Delta_1$, giving for example the probability
mass functions for which $p(0, 0) = 0$ or 1.
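
The decomposition in Example 1 can be verified numerically. This is an illustrative sketch with made-up probabilities; the function name `project_onto_faces` is hypothetical.

```python
from fractions import Fraction as F

def project_onto_faces(P):
    """P = (p00, p01, p10, p11) with both p00+p01 and p10+p11 nonzero.
    Returns (w_F, P_F, w_Fc, P_Fc) such that P = w_F * P_F + w_Fc * P_Fc."""
    p00, p01, p10, p11 = P
    w_F, w_Fc = p00 + p01, p10 + p11
    P_F = (p00 / w_F, p01 / w_F, F(0), F(0))      # point on the face F
    P_Fc = (F(0), F(0), p10 / w_Fc, p11 / w_Fc)   # point on the complementary face
    return w_F, P_F, w_Fc, P_Fc

P = (F(1, 10), F(2, 10), F(3, 10), F(4, 10))
w_F, P_F, w_Fc, P_Fc = project_onto_faces(P)
recombined = tuple(w_F * a + w_Fc * b for a, b in zip(P_F, P_Fc))
print(P_F[:2])          # the conditional distribution of Y given X = 0
print(recombined == P)
```

The first two coordinates of `P_F` are exactly the conditional probabilities of $Y$ given $X = 0$, so conditioning is the simplicial projection followed by dropping the zero coordinates.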

By repeated projections we can condition on $\mathrm{Prob}(X_J = x_J^*) > 0$ with
$J \subset [n]$. Then, the operation of conditioning returns a ratio of polynomial
forms of the type $x/(x + y + z)$ where $x, y, z$ stand for joint mass function
values. This has been implemented in computer algebra software by various
researchers, as an application of elimination theory. A basic algorithm
considers indeterminates $t_x$ with $x \in S_X$ for the domain space and $b_y$ with
$y \in [r_{-J}]$ for the image space. The joint probability mass $(p(x) : x \in S_X)$
corresponds to $I = \mathrm{Ideal}(t_x - p(x) : x \in S_X)$ of $\mathbb{Q}[t_x : x \in S_X]$, the set of
polynomials in the $t_x$ with rational coefficients. Its projection onto the face
$F_J$ can be computed by elimination as follows, by adjoining a dummy
indeterminate $l$ and viewing $I$ as an ideal in $\mathbb{R}[t_x : x \in S_X, b_y : y \in [r_{-J}], l]$.
Consider $I + J$ where $J$ is the ideal generated by

    $1 - \sum_{y \in [r_{-J}]} b_y$
    $b_y\, l - p(x) \sum_{y \in [r_{-J}]} b_y$    (3.5)

where $x$ and $y$ are suitably matched by the definition of conditioning. Then
the elimination ideal of $I + J$ of the $l$ and $t_x$ variables corresponds to the
simplicial projection.

Example 2. We use the freely available software CoCoA [2] to project
the point $P = (1/3, 1/3, 1/3) \in \Delta_2$ onto the face $x_1 + x_2 = 1$. The ideal of
the point $P$ in the t[1], t[2], t[3] indeterminates is I = Ideal(t[1] - 1/3,
t[2] - 1/3, t[3] - 1/3). L describes a plane parallel to the face x[3] = 0 of the
simplex and J is the ideal in Equation (3.5). Lex and GBasis are the
technical commands to perform the elimination. The result is in the last
line.

Use T ::= Q[t[1..3] l s[1..2]], Lex;
I := Ideal(t[1]-1/3, t[2]-1/3, t[3]-1/3);
L := t[1]+t[2]-1;

J := Ideal(s[1]l-1/3, s[2]l-1/3, s[1]+s[2]-1, L, s[1]+s[2]-1);
GBasis(I+J);

[t[3] - 1/3, t[2] - 1/3, t[1] - 1/3,
s[1] + s[2] - 1, -l + 2/3, 2/3s[2] - 1/3]
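
The printed result can be cross-checked by hand. The sketch below, in Python rather than CoCoA, assumes that the last two basis elements read `-l + 2/3` and `2/3 s[2] - 1/3`, solves for the dummy indeterminate and the image coordinates directly, and evaluates the generators involving `s` and `l` at that solution. It does not run any Gröbner basis computation.

```python
from fractions import Fraction as Fr

p = [Fr(1, 3), Fr(1, 3), Fr(1, 3)]   # the point P, coordinates t[1..3]
face = [0, 1]                        # coordinates spanning the face

l = sum(p[k] for k in face)          # dummy indeterminate: total mass on the face
s = [p[k] / l for k in face]         # image coordinates s[1], s[2]

# Generators of J that involve s and l, evaluated at this solution.
residuals = [s[0] * l - p[0], s[1] * l - p[1], s[0] + s[1] - 1]
# Last two elements of the printed basis (as read above), evaluated likewise.
gb_tail = [-l + Fr(2, 3), Fr(2, 3) * s[1] - Fr(1, 3)]
```

All entries of `residuals` and `gb_tail` vanish, so the projected point is $(1/2, 1/2)$ with $l = 2/3$.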

3.3. The manipulation of a Bayesian network. In Equation (3.10)
of [15] J. Pearl, starting from a joint probability mass function on $X$, an
$x_i^*$ value and assuming a causal order for a BN, defines a new probability
mass function for the intervention $X_i = x_i^*$. In general, we partition
$[n] = \{i\} \cup \{1, \ldots, i-1\} \cup \{i+1, \ldots, n\}$ and assume this partition
compatible with a causal order on $X$, that is if $j \in \{1, \ldots, i-1\}$ then $X_j$
is not affected by the intervention on $X_i$. If the probabilistic structure on $X$
is a BN then we consider a regular BN. We consider the parametrization

$$p(x) = p(x_1, \ldots, x_{i-1})\, p(x_i \mid x_1, \ldots, x_{i-1})\, p(x_{i+1}, \ldots, x_n \mid x_1, \ldots, x_i)$$

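for which a probability is seen as a point in a product of simplices.
The intervention or manipulation operation is defined only for image points
for which $x_i \neq x_i^*$ and returns a point in a smaller product of simplices,
namely the point with coordinates $p(x_1, \ldots, x_{i-1})$ and
$p(x_{i+1}, \ldots, x_n \mid x_1, \ldots, x_{i-1}, x_i^*)$
for $(x_1, \ldots, x_{i-1}) \in \mathcal{X}_1 \times \ldots \times \mathcal{X}_{i-1}$ and
$(x_{i+1}, \ldots, x_n) \in \mathcal{X}_{i+1} \times \ldots \times \mathcal{X}_n$.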

Note that this map is naturally defined over the boundary. In contrast 
there is no unique map extendible to the boundary of the probability space 
in A x - 

For binary random variables it is the orthogonal projection from $C_{2^n-1}$
onto the face $x_i \neq x_i^*$ which is identified with the hypercube $C_{2^{n-1}-1}$. In
general, for a regular BN this is an orthogonal projection in the associated
conditional parametrisation, which then seems the best parametrization in
which to perform computations. The post manipulation joint mass function
on $X \setminus X_i$ is then
$p(x_1, \ldots, x_{i-1})\, p(x_{i+1}, \ldots, x_n \mid x_1, \ldots, x_{i-1}, x_i^*)$
which, factorised in primitive probabilities, gives a monomial of degree $n - 1$,
one less than in Equation (2.3). In this sense, under the conditional
parametrization, the effect of a manipulation or control gives a much simpler
algebraic map than the effect of conditioning.
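
The contrast between the two maps can be illustrated on a toy case. The following Python sketch assumes three binary variables in the causal order $X_1, X_2, X_3$ with hypothetical primitive probabilities; the variable names and numbers are illustrative only. Manipulating $X_2$ drops the factor $p(x_2 \mid x_1)$, whereas conditioning on $X_2$ divides joint mass values by a marginal.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical primitive probabilities for binary X1, X2, X3 in causal order.
p1 = {0: Fr(1, 2), 1: Fr(1, 2)}                           # p(x1)
p2 = {0: {0: Fr(1, 4), 1: Fr(3, 4)},                      # p(x2 | x1)
      1: {0: Fr(2, 3), 1: Fr(1, 3)}}
p3 = {(a, b): {0: Fr(1 + a + b, 5), 1: Fr(4 - a - b, 5)}  # p(x3 | x1, x2)
      for a, b in product((0, 1), repeat=2)}

joint = {(a, b, c): p1[a] * p2[a][b] * p3[(a, b)][c]
         for a, b, c in product((0, 1), repeat=3)}

x2_star = 1
# Manipulation X2 = x2*: drop the factor p(x2 | x1); a monomial of degree 2.
manipulated = {(a, c): p1[a] * p3[(a, x2_star)][c]
               for a, c in product((0, 1), repeat=2)}
# Conditioning on X2 = x2*: a ratio of polynomials in the joint mass values.
mass = sum(v for (a, b, c), v in joint.items() if b == x2_star)
conditioned = {(a, c): joint[(a, x2_star, c)] / mass
               for a, c in product((0, 1), repeat=2)}
```

Both `manipulated` and `conditioned` are mass functions on $(X_1, X_3)$, but they differ in general: with these numbers the entry for $(0, 0)$ is $1/5$ under manipulation and $18/65$ under conditioning.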

Its formal definition depends only on the causal order, the second bullet
point in Section 2.1, and not on the probabilistic structure of the BN. In
particular it does not depend on the homogeneity of the factorization of 
the joint mass function on X across all settings. This observation allowed 
us to extend this notion to larger classes of discrete causal models. See 
[201 E] and Section [2 

Identification problems associated with the estimation of some prob- 
abilities after manipulation from passive observations (manifest variables 
measured in the idle system) have been formulated as an elimination prob- 
lem in computational commutative algebra. For example in the case of BN 
the case study in [10] , giving a graphical application of the back-door the- 
orem [15], has been replicated algebraically by Matthias Drton using the 
parametrization in primitive probabilities. Ignacio Ojeda addresses from
an algebraic viewpoint a different and more unusual identification problem
in a causal BN with four nodes. He uses the $p(x)$ parameters and the
description of the BN as a toric ideal. Both are personal communications
at the workshop to which this volume is dedicated. 

In general, a systematic implementation of these problems in computer
algebra software will be slow to run. At times some pre-processing can
be performed in order to exploit the symmetries and invariances to various
group actions for certain classes of statistical models [13]. Other times a
re-parametrisation in terms of non-central moments has an order of magnitude
effect on the speed of computation [23] and hence can be useful.
Nevertheless in this algebraic framework many non-graphically based
symmetries which appear in common models are much easier to exploit than
in a graphical setting. This suggests that the algebraic representation of
causality is a promising way of computing the identifiability of a causal
effect in much wider classes of models than BN.

4. Reformulating causality algebraically. To recap: 

1. a total order on $X = \{X_1, \ldots, X_n\}$ and an associated multiplication
rule as in Equation (3.3) are fundamental. These determine a
set of primitive probabilities;

2. a discrete BN can be described through a set of linear equations
equating primitive probabilities, Equations (2.4), together with
inequalities to express non-negativity of probabilities and linear
equations for the sum-to-one constraints;

3. a BN is based on the assumption that the factorization in Equation
(2.3) holds across all values of $x$ in a cross product sample space.
Recall that in [23] it is shown that identification depends on the
sample space structure, in particular on the number of levels a
variable takes;

4. within a graphical framework subsets of whole variables in X are 
considered manifest or hidden; 

5. mainly the causal controls being studied in e.g. [HI [26] correspond 
to setting subsets of variables in X to take particular values and 
often the effect of a cause is expressed as a polynomial function 
of the primitive probabilities, in particular the probability of a 
suitable marginal; 

6. identification problems formulated in the graphical framework of 
a BN and intended as the writing of an effect of a cause in terms 
of manifest variables are basically elimination problems. Hence 
they can be addressed using elimination theory from computa- 
tional commutative algebra. In particular theorems like the front- 
door theorem and the back-door theorem are proved using clever 
algebraic eliminations, see [15].

The above scheme can be modified in many directions to include
non-graphical models and causal functions not expressible in a graphical
framework, like those in Section 2.3. Identification problems can still be
addressed with algebraic methods as in Item 6 above. An indispensable point
for a causal interpretation of a model is a partial order either on $X$ or on
$S_X$, where the sample space may be generalised to be not of product form.

A first generalisation is in [19] where the authors substitute the
binomials in Item 2 above with linear equations and the inequalities in Item 1
with inequalities between linear functions in the primitive probabilities. If
there exists at least one probability distribution over $X$ satisfying this set
of equations and inequalities then the model is called a feasible Bayesian
linear constraint model.

Of course a mere algebraic representation of a model will lose the
expressiveness and interpretability associated with the compact topology
of most graphical structures and hence to dispense completely with the
graphical constraints might not always be advisable. But a combined use
of a graphical representation and an algebraic one will certainly allow the
formulation of more general model classes and will allow causality to benefit
from the computational and interpretative techniques of algebraic geometry, as
currently happens in computational biology [14]. A causal model structure
based on a single rooted tree and amenable to an algebraic formulation is
studied in [20], [21]. There, following [24], the focus of the causal model is
shifted from the factors in $X$ to the actual circumstances. Each node of
the tree represents a "situation" (in the case of a BN a possible setting
of the $X$ vector) and the partial order intrinsic to the tree is consistent
with the order in which we believe things can happen. This approach
has many advantages, freeing us from the sorts of ambiguity discussed in
Section 2.1.1 and allowing us to define simple causal controls that enact a
particular policy only when conditions might require that control.

4.1. Causality based on trees. Assume a single rooted tree $T = (V, E)$
with vertex set $V$ and edge set $E$. Let $e = (v, v')$ be a generic edge
from $v$ to $v'$ and associate to $e$ a possibly unknown transition probability
$\pi(v' \mid v) \in [0, 1]$ under the constraint $\sum_{v' : (v,v') \in E} \pi(v' \mid v) = 1$,
for all $v \in V$ which are not leaf vertices. The set $\Pi = \{\pi(v' \mid v)\}$ gives
a parametrization of our model and the $\pi(v' \mid v)$ are called primitive
probabilities. Let $\mathbb{X}$ be the set of root-to-leaf paths in $T$ and for
$\lambda = (e_1, \ldots, e_{n(\lambda)}) = (v_0, \ldots, v_{n(\lambda)}) \in \mathbb{X}$,
where $v_0$ is the root vertex and $v_{n(\lambda)}$ a leaf vertex, define the polynomials

$$p(\lambda) = \prod_{i=0}^{n(\lambda)-1} \pi(v_{i+1} \mid v_i). \qquad (4.1)$$

In [20] it is shown that $(\mathbb{X}, 2^{\mathbb{X}}, p(\cdot))$ is a probability space. The set of
circumstances of interest is then represented by the nodes of the tree and
the probabilistic events are given by the leaves of the tree, equivalently the
root-to-leaf paths.
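
The path polynomials can be evaluated on a small tree to see that they define a mass function on the root-to-leaf paths. The sketch below is in Python; the tree, its vertex labels and the numerical transition probabilities are hypothetical and chosen only so that paths have different lengths.

```python
from fractions import Fraction as Fr

# Hypothetical rooted tree: edges[v] maps each child v' of v to pi(v'|v).
edges = {
    "v0": {"v1": Fr(1, 3), "v2": Fr(2, 3)},
    "v1": {"v3": Fr(1, 2), "v4": Fr(1, 2)},
    "v2": {"v5": Fr(1, 4), "v6": Fr(1, 4), "v7": Fr(1, 2)},
    "v5": {"v8": Fr(1, 5), "v9": Fr(4, 5)},
}

def path_probabilities(root):
    """Return {root-to-leaf path: product of transition probabilities along it}."""
    out = {}
    stack = [((root,), Fr(1))]
    while stack:
        path, prob = stack.pop()
        children = edges.get(path[-1])
        if not children:                       # leaf vertex: the path is complete
            out[path] = prob
            continue
        for child, pi in children.items():
            stack.append((path + (child,), prob * pi))
    return out

p = path_probabilities("v0")
```

With these values the six path probabilities sum to one, as the sum-to-one constraints at the non-leaf vertices require, and for instance the path through `v2`, `v5`, `v9` has probability $\frac{2}{3} \cdot \frac{1}{4} \cdot \frac{4}{5} = \frac{2}{15}$.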

Here are three examples from the literature. Once an order on X 
has been chosen, a BN corresponds to a tree whose root-to-leaf paths have 
all the same length, Sx = X and its independence structure is translated 
into equalities of some primitive probabilities [25, 24]. The basic saturated
model identified by the polynomials in Equations (4.1) augmented with
a set of algebraic equations in the elements of $\Pi$ has been called an
algebraically constrained tree in [3T] . In [501 [23 12H1 12S] a model based on a tree
and called a chain event graph has now been developed and explored to 
some level of detail. 

There is a natural partial order associated with the tree which can
be used as a framework to express causality: $v \prec v'$ if there exists
$\lambda \in \mathbb{X}$ such that $v, v' \in \lambda$ and $v$ lies closer to $v_0$ than $v'$.
A tree is regular if in the problem we are modelling the circumstance
represented by $v$ occurs before the one represented by $v'$ whenever
$v \prec v'$. The effects of a control on a regular tree $T$ can now be defined in
total analogy to Item 5 above by modifying the values of some primitive
probabilities or more generally by defining constraints in the primitive
probabilities that have a causal interpretation.

Definition 4.1. Let $T = (V, E)$ be a regular tree and $\Pi$ the associated
primitive probabilities. A manipulation of the tree is given by a subset
$F \subset E$ and an extra set of parameters associated to edges in $F$, namely
$\Pi_F = \{\hat\pi(v' \mid v) : (v, v') \in F\}$ under the constraints
$\hat\pi(v' \mid v) \geq 0$ for all $(v, v') \in F$ and
$\sum_{v' : (v,v') \in E \setminus F} \pi(v' \mid v) + \sum_{v' : (v,v') \in F} \hat\pi(v' \mid v) = 1$.
Furthermore the $\hat\pi(v' \mid v)$ are assumed to be functions of the primitive
probabilities for all $(v, v') \in F$.

For example in the typical manipulations in [15, 20] and in Section 3.3
some $\hat\pi(v' \mid v)$ are chosen equal to one and hence some others equal to zero.
Here we observe that Definition 4.1 translates into a map similar to the
one discussed for BN's in Section 3.3.

To simplify notation let $S \subset V$ be the set of non-leaf vertices in $T$.
For $v \in S$ let $X(v) = \{v' \in V : (v, v') \in E\}$ and $r_v$ be the cardinality
of $X(v)$. Then the saturated model on a tree is equivalent to the list of
primitive probabilities
$\pi = (\pi(v' \mid v) : (v, v') \in E) \in \prod_{v \in S} \Delta_{r_v - 1}$ together
with the semi-algebraic constraints $\sum_{v' : (v,v') \in E} \pi(v' \mid v) = 1$ and
$\pi(v' \mid v) \geq 0$ and with the partial order of the tree, equivalently
Equations (4.1).

For $F \subset E$ let $D_F = \{v \in V : \text{there exists } v' \text{ such that } (v, v') \in F\}$.
We can re-arrange the list $\pi$ to list first primitive probabilities of edges not
in $F$ and then a manipulation on $F$ is given by the mapping

$$\prod_{v \in S \setminus D_F} \Delta_{r_v - 1} \times \prod_{v \in D_F} \Delta_{r_v - 1} \longrightarrow \prod_{v \in S \setminus D_F} \Delta_{r_v - 1} \times \prod_{v \in D_F} \Delta_{r_v - 1}$$

$$(\pi(v' \mid v) : (v, v') \in E) \longmapsto (\pi(v' \mid v) : (v, v') \in E \setminus F,\ \hat\pi(v' \mid v) : (v, v') \in F)$$

For the typical manipulations in [15, 20] and in Section 3.3 this map
simplifies to an orthogonal projection on
$\prod_{v \in S \setminus D_F} \Delta_{r_v - 1} \ni (\pi(v' \mid v) : v \in S \setminus D_F)$.
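
A manipulation in the sense of Definition 4.1 can be sketched as a replacement of some edge parameters followed by a check of the constraints. The Python code below is an illustration under stated assumptions: the tree, the numbers and the function name `manipulate` are hypothetical, and the manipulation shown is the typical one that sends the unit along one edge with probability one.

```python
from fractions import Fraction as Fr

# Hypothetical primitive probabilities pi(v'|v), keyed by edge (v, v').
pi = {("v0", "v1"): Fr(1, 3), ("v0", "v2"): Fr(2, 3),
      ("v1", "v3"): Fr(1, 2), ("v1", "v4"): Fr(1, 2),
      ("v2", "v5"): Fr(1, 4), ("v2", "v6"): Fr(3, 4)}

def manipulate(pi, pi_hat):
    """Replace pi on the edges F = pi_hat.keys() by the new parameters pi_hat,
    then check non-negativity and the sum-to-one constraint at every vertex."""
    new = {**pi, **pi_hat}
    totals = {}
    for (v, _), value in new.items():
        if value < 0:
            raise ValueError("negative transition probability")
        totals[v] = totals.get(v, Fr(0)) + value
    if any(t != 1 for t in totals.values()):
        raise ValueError("sum-to-one constraint violated")
    return new

# Typical manipulation: force the transition from v0 to v1 with probability one.
pi_hat = {("v0", "v1"): Fr(1), ("v0", "v2"): Fr(0)}
new = manipulate(pi, pi_hat)
D_F = {v for (v, _) in pi_hat}
# Coordinates left untouched: those of vertices outside D_F (the projection).
kept = {e: new[e] for e in new if e[0] not in D_F}
```

The dictionary `kept` holds exactly the coordinates $(\pi(v' \mid v) : v \in S \setminus D_F)$, unchanged by the manipulation, which is the sense in which the map reduces to an orthogonal projection.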

4.2. Extreme causality. To effectively discuss causal maps we notice
that we need 1. a finite set of "circumstances" (in the BN represented
by parent configurations and in the tree by the tree situations)
augmented with a finite set of "terminal circumstances", e.g. the possible
final outcomes of an experiment, and 2. a partial order defined on these
circumstances expressing the causal hypotheses of the system. The
circumstances could be identified with particular types of causally critical
events in the event space of the uncontrolled system, e.g. $\mathbb{X}$ of Section 4.1.

Hence let $V = \{v\}$ be the finite set representing circumstances and
terminal circumstances and $\prec$ a partial order on $V$. The partial order
can be visualised through its Hasse diagram and corresponds to a finite
number of chains of elements of $V$. A chain is a list of elements in $V$:
$\lambda = (v_1, \ldots, v_n)$ where $v_{i-1} \prec v_i$ for all $i = 2, \ldots, n$ and such
that for no $v', v'' \in V$ we have $v' \prec v_1$ and $v_n \prec v''$. A circumstance can
belong to more than one chain and chains can have different lengths, initial
circumstances and terminal circumstances. A chain represents a possible
unfolding of the problem we are modelling, from a starting point, $v_1$, to an
end point, $v_n$. The order represents the way circumstances succeed one another
and one could be the cause of a subsequent one.

Once the partial order in $V$ has been elicited, a parametrization of
a saturated statistical model on $V$ can be defined as a set of transition
probabilities: $\pi(v' \mid v) \in [0, 1]$ where $v, v' \in V$ are such that $v'$ and $v$
are in the same chain, say $\lambda$, $v \prec v'$ and there is no $v^* \in \lambda$ such
that $v \prec v^* \prec v'$. That is, there is a chain to which both $v$ and $v'$ belong
and $v$ precedes $v'$ immediately in the chain. We call $\pi(v' \mid v)$ primitive
probabilities, collect them in a vector $\pi = (\pi(v' \mid v))$ and note that they
can be given as labels to the edges of the Hasse diagram. Moreover, we require
that if $v$ belongs to more than one chain, then the sum of the transition
probabilities $\pi(\cdot \mid v)$ is equal to one, i.e.
$\sum_{v' \in \lambda : v \in \lambda} \pi(v' \mid v) = 1$. This defines the domain space
of $\pi$ as a product of simplices in total analogy to the cases of BN's and trees.
The probability of a chain $\lambda$ is now defined as
$p(\lambda) = \prod_{i=1}^{n-1} \pi(v_{i+1} \mid v_i)$ in
analogy to Equations (4.1) and (3.3).
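
As an illustration of chains sharing circumstances, the Python sketch below enumerates the maximal chains of a small Hasse diagram and multiplies the edge labels along each. The diagram, its labels and the numbers are hypothetical; the only assumption beyond the text is that the transition probabilities leaving each non-terminal circumstance sum to one.

```python
from fractions import Fraction as Fr

# Hypothetical Hasse diagram: cover[v] maps each v' covering v to pi(v'|v).
cover = {
    "a": {"b": Fr(1, 3), "c": Fr(2, 3)},
    "b": {"d": Fr(1)},
    "c": {"d": Fr(1, 2), "g": Fr(1, 2)},
    "d": {"e": Fr(1, 4), "f": Fr(3, 4)},
}
covered = {w for children in cover.values() for w in children}
minimal = set(cover) - covered                 # initial circumstances

def maximal_chains(v, prefix=(), prob=Fr(1)):
    """Yield (chain, probability) for every maximal chain extending the prefix."""
    prefix = prefix + (v,)
    if v not in cover:                         # terminal circumstance
        yield prefix, prob
        return
    for w, pi in cover[v].items():
        yield from maximal_chains(w, prefix, prob * pi)

chains = {c: p for m in sorted(minimal) for c, p in maximal_chains(m)}
```

The circumstance `d` belongs to four of the five chains, chains have different lengths, and the chain probabilities still sum to one.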

Thus, we have determined a saturated model parametrised with $\pi$
and given by the sum-to-one constraints and the non-negativity conditions.
A sub-model, say $\mathcal{S}$, can be defined by adjoining equalities and inequalities
between polynomials or ratios of polynomials in the primitive probabilities,
say $q(\pi) = 0$ and $r(\pi) > 0$, where $q$ and $r$ are polynomials or ratios of
polynomials. Of course one must ensure that there is at least one solution
to the obtained system of equalities and inequalities; that is, that the model
is feasible. Sub-models can also be defined through a refinement of the
partial order.

Next, causality can be defined implicitly by considering a set $F$ of
edges of the Hasse diagram and for $(v, v') \in F$ adjoining to $\mathcal{S}$ a new set
of primitive probabilities $\hat\pi(v' \mid v)$ and some equations
$\hat\pi(v' \mid v) = f_{(v,v')}(\pi)$ where $f_{(v,v')}$ is a polynomial. Collect the new
parameters in the list $\hat\pi = (\hat\pi(v' \mid v) : (v, v') \in F) = f(\pi)$, where
$f = (f_{(v,v')} : (v, v') \in F)$.

Identifiability problems are now formulated as in previous sections.
Suppose we observe some polynomial equalities of the primitive probabilities,
$m = m(\pi)$, and even some inequalities $m(\pi) > 0$, where $m$ is a vector
of polynomials. Then we are interested in checking whether a total cause,
$e = e(\hat\pi)$, is identifiable from and compatible with the given observation.
This computation could be done by using techniques of algebraic geometry
in total analogy to BN's and trees as discussed in Item 6.

The top-down scheme in Table 2 summarises all this. In the top
cell we have a semi-algebraic set-up involving equalities and inequalities in
the $\pi$ parameters involving polynomials or ratios of polynomials. We must
ensure that the set of values of $\pi$ which solve this system of equalities and
inequalities is not empty, i.e. the model is feasible. In the next two cells we
add two sets of indeterminates, $\hat\pi$ and $m$, and some equalities and
inequalities of polynomials in the $\pi$. Then the effect $e$ is uniquely identified
if there is a value $\pi^*$ of $\pi$ satisfying the system and $e = e(m(\pi^*))$.

Saturated model        $0 \leq \pi(v' \mid v) \leq 1$ and $\sum_{v' \in \lambda : v \in \lambda} \pi(v' \mid v) = 1$
Submodel               $q(\pi) = 0$ and $r(\pi) > 0$
System manipulation    $\hat\pi = f(\pi)$
Manifest               $m = m(\pi)$ and $m(\pi) > 0$
Identifiability        $e = e(m(\pi^*))$

Table 2
Summary of Section 4.2

All the models considered in this paper fall within this framework 
and within the class of algebraic statistical models [BJ. In particular in 
CEG models [55] circumstances are defined as sets of vertices of a tree 
and the partial order is inherited from the tree order. CEG's in a causal 
context have been studied in [21] and they have been applied to the study 
of biological regulation models pQ. We conjecture that there are many 
other classes of causal models that have an algebraic formulation of this 
type and are useful in practical applications. We end this paper with a short
discussion of how the identifiability issues associated with the non-graphical
example of Section 2.1.1 can be addressed algebraically.

4.3. Identifying a cause in our example. For the example in
Section 2.1.1 assume conditions (2.5) and (2.6). Hence, for $x_1 = 1, 2, 3$ the
non-zero probabilities associated with not viewing the movie are
$p(x_1, 2, x_1, 1) = \pi_1(x_1)\pi_2(2)\pi_4(1 \mid 2, x_1)$ and
$p(x_1, 2, x_1, 2) = \pi_1(x_1)\pi_2(2)\pi_4(2 \mid 2, x_1)$ whilst
the probabilities associated with viewing it are given in Table 3.

Consider the two controls described in the bullets in Section 2.3. The
first, banning the film, gives non-zero probabilities for $x_1 = 1, 2, 3$
satisfying the equations $\hat p(x_1, 2, x_1, 1) = \pi_1(x_1)\pi_4(1 \mid 2, x_1)$ and
$\hat p(x_1, 2, x_1, 2) = \pi_1(x_1)\pi_4(2 \mid 2, x_1)$. The second, the fixing of
testosterone levels to low for all time, gives manipulated probabilities

$$\hat p(1, 2, 1, 1) = \pi_2(2)\pi_4(1 \mid 2, 1) \qquad \hat p(1, 2, 1, 2) = \pi_2(2)\pi_4(2 \mid 2, 1)$$

$$\hat p(1, 1, 1, 1) = \pi_2(1)\pi_4(1 \mid 1, 1) \qquad \hat p(1, 1, 1, 2) = \pi_2(1)\pi_4(2 \mid 1, 1).$$

Table 3
Probabilities associated with viewing the movie

Now consider three experiments. Experiment 1 of Section 2.2 exposes men
to the movie, measuring their testosterone levels before and after viewing
the film. This obviously provides us with estimates of $\pi_1(x_1)$, for
$x_1 = 1, 2, 3$, and $\pi_3(x_3 \mid x_1, 1)$, $1 \leq x_1 \leq x_3 \leq 3$. Under
Experiment 2 of Section 2.2 a large random sample is taken over the relevant
population providing good estimates of the probability of the margin of each
pair of $X_2$ and the level of testosterone $X_3$ on those who fought,
$\{X_4 = 1\}$, but only the probability of not fighting otherwise. So from the
sample we can estimate, for $x_1 = 1, 2, 3$, the values of
$p(x_1, 2, x_1, 1) = \pi_1(x_1)\pi_2(2)\pi_4(1 \mid 2, x_1)$ and

$$p(1, 1, 1, 1) = \pi_1(1)\pi_2(1)\pi_3(1 \mid 1, 1)\pi_4(1 \mid 1, 1)$$
$$p(1, 1, 2, 1) = \pi_1(1)\pi_2(1)\pi_3(2 \mid 1, 1)\pi_4(1 \mid 1, 2)$$
$$p(1, 1, 3, 1) = \pi_1(1)\pi_2(1)\pi_3(3 \mid 1, 1)\pi_4(1 \mid 1, 3)$$
$$p(2, 1, 2, 1) = \pi_1(2)\pi_2(1)\pi_3(2 \mid 2, 1)\pi_4(1 \mid 1, 2)$$
$$p(2, 1, 3, 1) = \pi_1(2)\pi_2(1)\pi_3(3 \mid 2, 1)\pi_4(1 \mid 1, 3)$$
$$p(3, 1, 3, 1) = \pi_1(3)\pi_2(1)\pi_3(3 \mid 3, 1)\pi_4(1 \mid 1, 3).$$

Note the last probability is redundant since it is one minus the sum of
those given above. Finally Experiment 3 is a survey that informs us about
the proportion of people watching the movie on any night, i.e. tells us
$(\pi_2(1), \pi_2(2))$.

Now suppose we are interested in the total cause [TS]

$$e = \sum_{x_1, x_3} \hat p(x_1, 2, x_3, 1) = \sum_{x_1} \pi_1(x_1)\pi_4(1 \mid 2, x_1)$$

of fighting if forced not to watch. Clearly this is identified from an
experiment that includes Experiments 2 and 3 by summing and division by
$\pi_2(2)$, but by no other combination of experiments. Similarly
$e' = \hat p(1, 1, 1, 1) = \pi_2(1)\pi_4(1 \mid 1, 1)$, the probability a man with
testosterone levels held low watches the movie and fights, is identified from
$p(1, 1, 1, 1)$ obtained from Experiments 1 and 2 by division.
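
The first identification can be verified on numbers. The Python sketch below assigns hypothetical values to the primitive probabilities (the numerical values and variable names are assumptions made for illustration), forms the quantities that Experiments 2 and 3 would estimate, and recovers the total cause from those manifest quantities alone.

```python
from fractions import Fraction as Fr

# Hypothetical primitive probabilities (x2 = 1: watches the movie, x2 = 2: does not).
pi1 = {1: Fr(1, 2), 2: Fr(3, 10), 3: Fr(1, 5)}                 # pi_1(x1)
pi2 = {1: Fr(2, 5), 2: Fr(3, 5)}                               # pi_2(x2)
pi4_fight_no_movie = {1: Fr(1, 10), 2: Fr(1, 5), 3: Fr(2, 5)}  # pi_4(1 | 2, x1)

# Experiment 2 (manifest): p(x1, 2, x1, 1) = pi_1(x1) pi_2(2) pi_4(1 | 2, x1).
observed = {x1: pi1[x1] * pi2[2] * pi4_fight_no_movie[x1] for x1 in pi1}
# Experiment 3 (manifest): pi_2(2).
pi2_not_watch = pi2[2]

# Total cause of fighting when the film is banned, written in the primitive
# probabilities, and the same quantity recovered from manifest values only.
e_target = sum(pi1[x1] * pi4_fight_no_movie[x1] for x1 in pi1)
e_identified = sum(observed.values()) / pi2_not_watch
```

Summing the observed probabilities over $x_1$ and dividing by $\pi_2(2)$ gives `e_identified`, which coincides with `e_target`; with the values above both equal $19/100$.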

The movie example falls within the general scheme of Section 4.2. Of
course a graphical representation of the movie example, e.g. over a tree
or even a BN, is possible and useful. But one of the points of this paper
is to show that when discussing causal modelling the first step does not
need to be the elicitation of a graphical structure whose geometry can
then be examined through its underlying algebra. Rather an algebraic
formulation based on the identification of the circumstances of interest,
e.g. the set $V$, and the elicitation of a causal order, e.g. the partial order
on $V$, is a more natural starting point. Clearly in such a framework, on the
one hand, the graphical type of symmetries embedded and easily visualised
on e.g. a BN are not immediately available but they can be retrieved (for
an example involving CEG and BN see [IS]). On the other hand algebraic
types of symmetries might be easily spotted and exploited in the relevant
computations.

In this example the computation was a simple algebraic operation, while in
more complex cases we might need to resort to a computer. Of course the
usual difficulties of using current computer code for elimination problems
of this kind remain, because inequality constraints are not currently
integrated into software and because of the high number of primitive
probabilities involved. The caveats in Section 3 for BN's, like the advantages
of ad-hoc parametrizations, apply to these structures based on trees and/or
defined algebraically.